Fast Feature Selection Using Fractal Dimension

Authors

  • Caetano Traina
  • Agma J. M. Traina
  • Leejay Wu
  • Christos Faloutsos
Abstract

The dimensionality curse and dimensionality reduction are two issues that remain of high interest in data mining, machine learning, multimedia indexing, and clustering. We present a fast, scalable algorithm to select the most important attributes (dimensions) of a given set of n-dimensional vectors. In contrast to older methods, ours has the following desirable properties: (a) it does not rotate attributes, so the resulting attributes are easy to interpret; (b) it can spot attributes that have nonlinear correlations; (c) it requires a constant number of passes over the dataset; (d) it gives a good estimate of how many attributes should be kept. The idea is to use the ‘fractal’ dimension of a dataset as a good approximation of its intrinsic dimension, and to drop attributes that do not affect it. We applied our method to real and synthetic datasets, where it gave fast and good results.

1 Introduction and Motivation

When managing the increasing volume of data generated by organizations, a question that frequently arises is: “what part of this data is really relevant to keep?”. Database relations usually have many attributes that are correlated with one another. Attribute selection is a classic goal, as is battling the “dimensionality curse” [Berchtold_1998] [Pagel_2000]. A carefully chosen subset of attributes improves the performance and efficacy of a variety of algorithms. This is particularly true for redundant data, since many datasets can be well approximated in fewer dimensions. Attribute selection can also be seen as a way to compress data, as only the attributes that maintain the essential characteristics of the data are kept [Fayyad_1998]. In this paper we introduce a novel technique that discovers how many attributes are significant to characterize a dataset. We also present a fast, scalable algorithm to select the most significant attributes of a dataset.
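
The core idea (estimate the intrinsic dimension with a fractal dimension, then drop attributes that barely change it) can be sketched in a few lines of Python. This is a minimal sketch under our own assumptions, not the authors' implementation: `correlation_dimension`, `rank_attributes`, `select_attributes`, and the `levels` parameter are hypothetical names; the correlation fractal dimension D2 is estimated by box counting over grids with 2^k cells per axis (k = 1..5 by default), counting same-cell point pairs; the backward elimination drops, at each step, the attribute whose removal changes that estimate the least; and keeping round(D2) attributes is only our reading of property (d).

```python
import math
import random
from collections import Counter


def correlation_dimension(points, attrs, levels=range(1, 6)):
    """Box-counting estimate of the correlation fractal dimension D2 of
    `points` projected onto the attribute indices in `attrs`."""
    # Min-max bounds of each selected attribute, used to map it onto [0, 1].
    lo = [min(p[a] for p in points) for a in attrs]
    hi = [max(p[a] for p in points) for a in attrs]
    xs, ys = [], []
    for k in levels:
        cells = 2 ** k  # grid cells per axis, so the cell side is r = 1 / cells
        counts = Counter()
        for p in points:
            # Cell coordinates of p; the maximum value is clamped into the last cell.
            key = tuple(
                min(int((p[a] - l) / (h - l) * cells), cells - 1) if h > l else 0
                for a, l, h in zip(attrs, lo, hi)
            )
            counts[key] += 1
        # Ordered pairs of distinct points sharing a cell (self-pairs excluded).
        pairs = sum(c * (c - 1) for c in counts.values())
        if pairs > 0:  # skip levels so fine that no cell holds two points
            xs.append(math.log(1.0 / cells))
            ys.append(math.log(pairs))
    if len(xs) < 2:
        raise ValueError("need at least two grid levels with shared cells")
    # D2 is the least-squares slope of log(pairs) against log(r).
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def rank_attributes(points, levels=range(1, 6)):
    """Backward elimination (assumed variant): repeatedly drop the attribute
    whose removal changes the D2 estimate of the remaining attributes the least.
    Returns attribute indices from first dropped to last kept, plus full-set D2."""
    remaining = list(range(len(points[0])))
    full_d = correlation_dimension(points, remaining, levels)
    current_d = full_d
    dropped = []
    while len(remaining) > 1:
        partial = {
            a: correlation_dimension(points, [x for x in remaining if x != a], levels)
            for a in remaining
        }
        best = min(remaining, key=lambda a: abs(current_d - partial[a]))
        remaining.remove(best)
        dropped.append(best)
        current_d = partial[best]
    return dropped + remaining, full_d


def select_attributes(points, levels=range(1, 6)):
    """Keep the round(D2) highest-ranked attributes.
    Assumption: this stopping rule is our reading of property (d)."""
    order, full_d = rank_attributes(points, levels)
    keep = max(1, round(full_d))
    return order[-keep:], full_d


if __name__ == "__main__":
    rng = random.Random(0)
    # Attribute 1 is a non-polynomial function of attribute 0; attribute 2 is independent.
    data = [(a, math.exp(a), c) for a, c in ((rng.random(), rng.random()) for _ in range(2000))]
    kept, d2 = select_attributes(data)
    print(f"D2 = {d2:.2f}, kept attributes: {kept}")
```

In the demo the points lie on a two-dimensional surface inside three-dimensional space, so the estimate should come out near 2; the independent third attribute should be kept and one of the two correlated attributes dropped. Counting pairs per grid cell takes one pass over the data per grid level, which is in the spirit of property (c), although this sketch repeats those passes for every candidate subset.
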
In contrast to other methods, such as Singular Value Decomposition (SVD) [Faloutsos_1996], our method has the following desirable properties: (a) it does not rotate attributes, leading to easy interpretation of the resulting attributes; (b) it can spot attributes that have nonlinear and even non-polynomial correlations; (c) it is linear in the number of objects in the dataset;

Similar sources

An improved algorithm for feature selection using fractal dimension

Dimensionality reduction is an important issue in data mining and machine learning. Traina[1] proposed a feature selection algorithm to select the most important attributes for a given set of n-dimensional vectors based on correlation fractal dimension. The author used a kind of multi-dimensional “quad-tree” structure to compute the fractal dimension. Inspired by his work, we propose a new and ...

An Adaptive Segmentation Method Using Fractal Dimension and Wavelet Transform

In analyzing a signal, especially a non-stationary signal, it is often necessary for the desired signal to be segmented into small epochs. Segmentation can be performed by splitting the signal at time instances where the signal amplitude or frequency changes. In this paper, the signal is initially decomposed into signals with different frequency bands using wavelet transform. Then, fractal dimension of ...

Adaptive Segmentation with Optimal Window Length Scheme using Fractal Dimension and Wavelet Transform

In many signal processing applications, such as EEG analysis, the non-stationary signal is often required to be segmented into small epochs. This is accomplished by drawing the boundaries of signal at time instances where its statistical characteristics, such as amplitude and/or frequency, change. In the proposed method, the original signal is initially decomposed into signals with different fr...

A Method for Labeling Brain Signals to Classify Different States of Anesthesia

Aims and background: This study develops a computational framework for the classification of different anesthesia states, including awake, moderate anesthesia, and general anesthesia, using electroencephalography (EEG) signals and peripheral parameters. Materials and Methods: The proposed method proposes ...

Publication date: 2000